I have been pondering various ways to import data into MongoDB lately. While it is possible to import .json or .csv files directly, they do not carry type information with them. Mongo's default behavior in this case appears to be to store the data in the "least expensive" format. Thus fields that are intended to be longs may be stored as integers, dates will be stored as strings, and so on. If we are using a strongly typed language, this can lead to issues when we read the data back out and it is not in the type we expect.
Many legacy systems send a .csv file containing rows like:
"Bob","Pants","07/14/1986"
We would then need a file descriptor of some sort to interpret the file, historically written in XML.
Or, perhaps we can try to use the file header to carry the information:
"f_name,String","l_name,String","birthdate,Date,MM/dd/yyyy"
"Bob","Pants","07/14/1986"
The problem with this approach is that we must update the header or the meta file every time the incoming data changes, and we need custom code to ingest the file and load it into MongoDB.
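To make that custom ingestion code concrete, here is a minimal sketch of how a typed header like the one above could be read. The class name `TypedCsvSketch`, its methods, and the `Long` type keyword are hypothetical names I made up for illustration. It also assumes every cell is quoted and that no value contains an embedded quote.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reads header cells shaped like "name,Type[,pattern]". */
final class TypedCsvSketch {

    /** One column description taken from a single header cell. */
    static final class Column {
        final String name;
        final String type;
        final String pattern; // only meaningful for Date columns, may be null

        Column(String name, String type, String pattern) {
            this.name = name;
            this.type = type;
            this.pattern = pattern;
        }
    }

    /** Splits a fully quoted line such as "a","b" into its cells (assumes no embedded quotes). */
    static List<String> splitQuoted(String line) {
        String trimmed = line.trim();
        // Drop the outermost quotes, then split on the "," that separates cells.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        List<String> cells = new ArrayList<>();
        for (String cell : inner.split("\",\"", -1)) {
            cells.add(cell);
        }
        return cells;
    }

    /** Turns the header line into column descriptions. */
    static List<Column> parseHeader(String headerLine) {
        List<Column> columns = new ArrayList<>();
        for (String cell : splitQuoted(headerLine)) {
            String[] parts = cell.split(",", 3); // name, type, optional pattern
            columns.add(new Column(parts[0], parts[1], parts.length > 2 ? parts[2] : null));
        }
        return columns;
    }

    /** Converts one data row into typed values keyed by column name. */
    static Map<String, Object> parseRow(List<Column> columns, String rowLine) {
        List<String> cells = splitQuoted(rowLine);
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            Column column = columns.get(i);
            String raw = cells.get(i);
            switch (column.type) {
                case "Date": {
                    SimpleDateFormat format = new SimpleDateFormat(column.pattern);
                    format.setLenient(false); // reject dates like 13/45/1986
                    try {
                        doc.put(column.name, format.parse(raw));
                    } catch (ParseException e) {
                        throw new IllegalArgumentException("Bad date in " + column.name + ": " + raw, e);
                    }
                    break;
                }
                case "Long": // assumed keyword, not part of the original header example
                    doc.put(column.name, Long.valueOf(raw));
                    break;
                default: // String and anything unrecognized stays a string
                    doc.put(column.name, raw);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Column> columns = parseHeader("\"f_name,String\",\"l_name,String\",\"birthdate,Date,MM/dd/yyyy\"");
        System.out.println(parseRow(columns, "\"Bob\",\"Pants\",\"07/14/1986\""));
    }
}
```

The resulting map holds a real `java.util.Date` for `birthdate`, which is the point: the type survives, but only because we wrote and now have to maintain this parser.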
Another option is to embed the type information in the JSON itself with $type markers. This is very verbose, but it works well. If we agree on a contract with the consuming code on what the $types mean, then we should be able to safely transmit our JSON files and have our types preserved. The layout of the data can change, and we can use JSON libraries to ingest the files and load them into MongoDB while preserving our type information. It will, however, still require some custom ingestion code.
What if we could generate a BSON file that could be loaded directly into MongoDB? BSON is a binary JSON specification and is how MongoDB natively stores its data. The mongodump and mongorestore utilities generate and consume BSON files, respectively.
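To show why the types survive in BSON, here is a small hand-rolled encoder that follows the public BSON specification for just four value types. It is a sketch for understanding the byte layout, not the driver's encoder, and the class name `TinyBsonSketch` and its methods are hypothetical. Each document is a little-endian int32 length, a run of elements (type byte, NUL-terminated field name, value), and a closing NUL byte. A .bson dump file is simply such documents written back to back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal BSON writer covering String, Integer, Long and Date only. */
final class TinyBsonSketch {

    /** Encodes a flat document; nested documents and arrays are out of scope here. */
    static byte[] encode(Map<String, Object> doc) {
        ByteArrayOutputStream elements = new ByteArrayOutputStream();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof String) {
                byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
                putName(elements, 0x02, entry.getKey());  // 0x02 = UTF-8 string
                put(elements, int32(utf8.length + 1));    // length includes the trailing NUL
                put(elements, utf8);
                elements.write(0);
            } else if (value instanceof Integer) {
                putName(elements, 0x10, entry.getKey());  // 0x10 = 32-bit integer
                put(elements, int32((Integer) value));
            } else if (value instanceof Long) {
                putName(elements, 0x12, entry.getKey());  // 0x12 = 64-bit integer
                put(elements, int64((Long) value));
            } else if (value instanceof Date) {
                putName(elements, 0x09, entry.getKey());  // 0x09 = UTC datetime in milliseconds
                put(elements, int64(((Date) value).getTime()));
            } else {
                throw new IllegalArgumentException("Unsupported type for field " + entry.getKey());
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        put(out, int32(4 + elements.size() + 1)); // total size: length prefix + elements + terminator
        put(out, elements.toByteArray());
        out.write(0);
        return out.toByteArray();
    }

    /** Appends one encoded document to a dump file, creating the file if needed. */
    static void appendTo(Path file, Map<String, Object> doc) throws IOException {
        Files.write(file, encode(doc), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    private static void putName(ByteArrayOutputStream out, int type, String name) {
        out.write(type);
        put(out, name.getBytes(StandardCharsets.UTF_8));
        out.write(0); // field names are NUL-terminated
    }

    private static void put(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    private static byte[] int32(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    private static byte[] int64(long value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("hello", "world");
        System.out.println(encode(doc).length + " bytes");
    }
}
```

Because the type byte travels with every field, a long stays a long and a date stays a date, with no header or side contract needed.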
There are several very important things to keep in mind.
The index file is interesting because it lets us not only transmit the type information along with the BSON file but also pass along the expected indexes. However, be very careful when adding indexes to existing collections!
Writing the BSON file itself is not very difficult. The Java driver includes a BSON encoder out of the box that we can use. Here is an example that uses the BasicBSONEncoder to write out a BSON file:
Here we are using Java 7 and the Files helper class to write the encoded MongoDB DBObject to our file, one object at a time.
We can see that this JSON document is an array of indexes.
Let's take a look at how we can extract information from our MySQL table and carry that index over to our MongoDB collection.
First we will use a wrapper around DBObject to map out the key-value pairs in the correct format:
Next we need to obtain the index information from the MySQL database:
This is a rather simplistic approach for pulling the index information from MySQL. More advanced or compound indexes will require additional logic to handle.
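For readers who want a starting point, here is a minimal sketch of reading index information through plain JDBC. It relies only on the standard `DatabaseMetaData.getIndexInfo` call. The class `IndexInfoSketch`, the `IndexColumn` holder and the method names are hypothetical, and this is not necessarily how my original code did it. The sketch groups columns by index name so that a multi-column index becomes one MongoDB-style key document, with 1 for ascending and -1 for descending.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that turns JDBC index metadata into MongoDB-style index keys. */
final class IndexInfoSketch {

    /** One row of getIndexInfo, reduced to the fields this sketch needs. */
    static final class IndexColumn {
        final String indexName;
        final String columnName;
        final boolean ascending;

        IndexColumn(String indexName, String columnName, boolean ascending) {
            this.indexName = indexName;
            this.columnName = columnName;
            this.ascending = ascending;
        }
    }

    /** Reads every index column for one table from the connection's current catalog. */
    static List<IndexColumn> readIndexColumns(Connection connection, String table) throws SQLException {
        DatabaseMetaData meta = connection.getMetaData();
        List<IndexColumn> rows = new ArrayList<>();
        // unique=false returns all indexes; approximate=false asks for accurate data.
        try (ResultSet rs = meta.getIndexInfo(connection.getCatalog(), null, table, false, false)) {
            while (rs.next()) {
                String indexName = rs.getString("INDEX_NAME");
                String columnName = rs.getString("COLUMN_NAME");
                if (indexName == null || columnName == null) {
                    continue; // statistics rows carry no index column
                }
                // ASC_OR_DESC is "A", "D", or null when sort order is not supported.
                boolean ascending = !"D".equals(rs.getString("ASC_OR_DESC"));
                rows.add(new IndexColumn(indexName, columnName, ascending));
            }
        }
        return rows;
    }

    /** Groups rows by index name, keeping column order: index name -> (column -> 1 or -1). */
    static Map<String, Map<String, Integer>> toMongoKeys(List<IndexColumn> rows) {
        Map<String, Map<String, Integer>> indexes = new LinkedHashMap<>();
        for (IndexColumn row : rows) {
            Map<String, Integer> key = indexes.get(row.indexName);
            if (key == null) {
                key = new LinkedHashMap<>();
                indexes.put(row.indexName, key);
            }
            key.put(row.columnName, row.ascending ? 1 : -1);
        }
        return indexes;
    }
}
```

With a live connection this would be called as `IndexInfoSketch.toMongoKeys(IndexInfoSketch.readIndexColumns(connection, "people"))`, where `people` is a made-up table name. Whether to carry over the primary key index is left to the caller, since MongoDB always maintains its own `_id` index.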
Once we have our list of indexes we can pass it along to the metadata file writer:
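As a rough illustration of what such a writer might do, the sketch below renders a list of indexes as a JSON document with a single `indexes` array and saves it as `<collection>.metadata.json` next to the .bson file, which is the naming mongodump uses. The field set (`v`, `key`, `ns`, `name`) is my assumption based on older mongodump output and may differ between MongoDB versions, so compare it against a real dump from your server before relying on it. The class `MetadataJsonSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical writer for a mongodump-style metadata file listing indexes. */
final class MetadataJsonSketch {

    /**
     * Renders the indexes as JSON.
     * @param namespace "database.collection", for example "test.people"
     * @param indexes   index name -> (field name -> 1 or -1), in key order
     */
    static String render(String namespace, Map<String, Map<String, Integer>> indexes) {
        StringBuilder json = new StringBuilder("{\"indexes\":[");
        boolean firstIndex = true;
        for (Map.Entry<String, Map<String, Integer>> index : indexes.entrySet()) {
            if (!firstIndex) {
                json.append(',');
            }
            firstIndex = false;
            json.append("{\"v\":1,\"key\":{"); // "v":1 is an assumed index version
            boolean firstField = true;
            for (Map.Entry<String, Integer> field : index.getValue().entrySet()) {
                if (!firstField) {
                    json.append(',');
                }
                firstField = false;
                json.append(quote(field.getKey())).append(':').append(field.getValue());
            }
            json.append("},\"ns\":").append(quote(namespace))
                .append(",\"name\":").append(quote(index.getKey())).append('}');
        }
        return json.append("]}").toString();
    }

    /** Writes the JSON beside the collection's .bson file. */
    static void write(Path dumpDirectory, String collection, String json) throws IOException {
        Path target = dumpDirectory.resolve(collection + ".metadata.json");
        Files.write(target, json.getBytes(StandardCharsets.UTF_8));
    }

    /** Quotes a string for JSON, escaping only backslashes and double quotes. */
    private static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Integer> key = new LinkedHashMap<>();
        key.put("l_name", 1);
        Map<String, Map<String, Integer>> indexes = new LinkedHashMap<>();
        indexes.put("l_name_1", key);
        System.out.println(render("test.people", indexes));
    }
}
```

Building the JSON by hand keeps the sketch free of dependencies. In real code a JSON library would be the safer choice, since the escaping here only covers quotes and backslashes.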
Now we can create a main class to run all of our classes together and create our import file!
I hope you have enjoyed my ramblings on importing data into MongoDB. All source code for these examples may be found here
Jai Hirsch
Senior Systems Architect
CARFAX
jai.hirsch@gmail.com